66 research outputs found

Analysis and Modeling of Advanced PIM Architecture Design Tradeoffs

Author: Brockman Jay
Sterling Thomas
Upchurch Ed
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/11/2004

A major trend in high performance computer architecture over the last two decades is the migration of memory in the form of high speed caches onto the microprocessor semiconductor die. Where temporal locality in the computation is high, caches prove very effective at hiding memory access latency and contention for communication resources. However, where temporal locality is absent, caches may exhibit low hit rates, resulting in poor operational efficiency. Vector computing exploiting pipelined arithmetic units and memory access addresses this challenge for certain forms of data access patterns, for example those involving long contiguous data sets exhibiting high spatial locality. But for many advanced applications for science, technology, and national security, at least some data access patterns are not consistent with the restricted forms well handled by either caches or vector processing. An important alternative is the reverse strategy: migrating logic into the main memory (DRAM) and performing those operations directly on the data stored there. Processor in Memory (PIM) architecture has advanced to the point where it may fill this role and provide an important new mechanism for improving performance and efficiency of future supercomputers for a broad range of applications. One important project considering both the role of PIM in supercomputer architecture and the design of such PIM components is the Cray Cascade Project sponsored by the DARPA High Productivity Computing Program. Cascade is a Petaflops-scale computer targeted for deployment at the end of the decade that merges the raw speed of an advanced custom vector architecture with the high memory bandwidth processing delivered by an innovative class of PIM architecture. The work represented here was performed under the Cascade project to explore critical design space issues that will determine the value of PIM in supercomputers and contribute to the optimization of its design. But this work also has strong relevance to hybrid systems comprising a combination of conventional microprocessors and advanced PIM-based intelligent main memory.

Caltech Authors

Initial Kernel Timing Using a Simple PIM Performance Model

Author: Block Gary L.
Brockman Jay B.
Callahan David
Katz Daniel S.
Springer Paul L.
Sterling Thomas

This presentation will describe some initial results of paper-and-pencil studies of 4 or 5 application kernels applied to a processor-in-memory (PIM) system roughly similar to the Cascade Lightweight Processor (LWP). The application kernels are:
* Linked list traversal
* Sum of leaf nodes on a tree
* Bitonic sort
* Vector sum
* Gaussian elimination
The intent of this work is to guide and validate work on the Cascade project in the areas of compilers, simulators, and languages. We will first discuss the generic PIM structure. Then, we will explain the concepts needed to program a parallel PIM system (locality, threads, parcels). Next, we will present a simple PIM performance model that will be used in the remainder of the presentation. For each kernel, we will then present a set of codes, including codes for a single PIM node, and codes for multiple PIM nodes that move data to threads and move threads to data. These codes are written at a fairly low level, between assembly and C, but much closer to C than to assembly. For each code, we will present some hand-drafted timing forecasts, based on the simple PIM performance model. Finally, we will conclude by discussing what we have learned from this work, including what programming styles seem to work best, from the point of view of both expressiveness and performance.

NASA Technical Reports Server

Recommended from our members

Science To Support DOE Site Cleanup: The Pacific Northwest National Laboratory Environmental Management Science Program Awards

Author: Bredt Paul R
Brockman Fred J
Grate Jay W
Hess Nancy J
Meyer Philip D
Murray Christopher J
Pfund David M
Su Yali
Thornton Edward C
Weber William J
Zachara John M
Publication venue: 'Office of Scientific and Technical Information (OSTI)'
Publication date: 19/06/2001

Pacific Northwest National Laboratory (PNNL) was awarded ten Environmental Management Science Program (EMSP) research grants in fiscal year 1996, six in fiscal year 1997, nine in fiscal year 1998, seven in fiscal year 1999, and five in fiscal year 2000. All of the fiscal year 1996 award projects have published final reports. The 1997 and 1998 award projects have been completed or are nearing completion. Final reports for these awards will be published, so their annual updates will not be included in this document. This section summarizes how each of the 1999 and 2000 grants addresses significant U.S. Department of Energy (DOE) cleanup issues, including those at the Hanford Site. The technical progress made to date in each of these research projects is addressed in more detail in the individual progress reports contained in this document. The 1999 and 2000 EMSP awards at PNNL are focused primarily in two areas: Tank Waste Remediation, and Soil and Groundwater Cleanup.

UNT Digital Library

Short Conduction Delays Cause Inhibition Rather than Excitation to Favor Synchrony in Hybrid Neuronal Networks of the Entorhinal Cortex

How stable synchrony in neuronal networks is sustained in the presence of conduction delays is an open question. The Dynamic Clamp was used to measure phase resetting curves (PRCs) for entorhinal cortical cells, and then to construct networks of two such neurons. PRCs were in general Type I (all advances or all delays) or weakly type II with a small region at early phases with the opposite type of resetting. We used previously developed theoretical methods based on PRCs under the assumption of pulsatile coupling to predict the delays that synchronize these hybrid circuits. For excitatory coupling, synchrony was predicted and observed only with no delay and for delays greater than half a network period that cause each neuron to receive an input late in its firing cycle and almost immediately fire an action potential. Synchronization for these long delays was surprisingly tight and robust to the noise and heterogeneity inherent in a biological system. In contrast to excitatory coupling, inhibitory coupling led to antiphase for no delay, very short delays and delays close to a network period, but to near-synchrony for a wide range of relatively short delays. PRC-based methods show that conduction delays can stabilize synchrony in several ways, including neutralizing a discontinuity introduced by strong inhibition, favoring synchrony in the case of noisy bistability, and avoiding an initial destabilizing region of a weakly type II PRC. PRCs can identify optimal conduction delays favoring synchronization at a given frequency, and also predict robustness to noise and heterogeneity

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Louisiana State University

FigShare

A Multidisciplinary Optimization Approach to Integrated Circuit Design

Author: Jay B. Brockman
John Renaud
Lokanathan Jay

In this paper, we investigate potential applications of multidisciplinary design optimization (MDO) algorithms to integrated circuit design. These algorithms use global sensitivity equations as a tool that provides for a temporary decoupling of the constituent subsystems of a complex system so that designers with expertise in different disciplines can design the subsystems. We develop two example design problems, one with hierarchically coupled subsystems and one with non-hierarchically coupled subsystems, to which we apply multidisciplinary optimization: (1) an example of joint process-circuit optimization, in which a semiconductor fabrication process is tuned concurrently with the design of a CMOS circuit, and (2) an example of circuit-thermal design, in which cell placement and cell design are done simultaneously with regard to the temperature effects on circuit performance. We discuss how this approach enables efficient and concurrent optimization of integrated circuits.

CiteSeerX

Measurement and Analysis of Sequential Design Processes

Author: Eric W. Johnson
Jay B. Brockman
Publication date: 01/01/1998

In this paper we describe the development of an analytical approach for evaluating sequential design process completion time and for determining the sensitivities of design time with respect to individual task durations and transition probabilities. Techniques are also detailed for collecting process metadata and calibrating a design process model. Example applications illustrate the use of the methodology in analyzing and improving software and hardware design processes. Categories and Subject Descriptors: J.6 [Computer Applications]: Computer-Aided Engineering - computer-aided design; K.3.2 [Computers and Education]: Computer and Information Science Education - self-assessmen
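The kind of calculation this abstract describes can be illustrated with a minimal sketch. It assumes the design process is modeled as an absorbing Markov chain, where each task has a fixed duration and a probability of handing off to each other task, and any leftover probability means the process finishes. That model is an assumption made here for illustration; the abstract does not state the authors' formulation. All function names and the two-task example are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: the process may never finish")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def expected_visits(p, start=0):
    """Expected number of times each task runs, starting from task `start`.

    p[i][j] is the assumed probability that task i is followed by task j;
    1 - sum(p[i]) is the probability that the process ends after task i.
    The visits v satisfy (I - P)^T v = e_start.
    """
    n = len(p)
    a = [[(1.0 if i == j else 0.0) - p[j][i] for j in range(n)] for i in range(n)]
    e = [1.0 if i == start else 0.0 for i in range(n)]
    return solve(a, e)


def expected_completion_time(durations, p, start=0):
    """Expected total time: sum over tasks of (expected visits * task duration)."""
    visits = expected_visits(p, start)
    return sum(v * d for v, d in zip(visits, durations))


# Hypothetical two-task process: "design" (2 days) always goes to "verify" (1 day);
# "verify" sends the work back to "design" 25% of the time, otherwise the process ends.
durations = [2.0, 1.0]
p = [[0.0, 1.0],
     [0.25, 0.0]]
visits = expected_visits(p)
total = expected_completion_time(durations, p)
```

Under this model the expected completion time is linear in the task durations, so the expected visit count of a task is also the sensitivity of completion time to that task's duration. Sensitivities with respect to transition probabilities are not covered by this sketch.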

CiteSeerX